The Democratization of Big Data

Author

  • Sean Fahey

Abstract

In recent years, it has become common for discussions about managing and analyzing information to refer to “data scientists” using “the cloud” to analyze “big data.” Indeed, these terms have become so ubiquitous in discussions of data processing that they appear in popular comic strips such as Dilbert and are tracked on Gartner’s Hype Cycle. The Harvard Business Review even labeled data scientist “the sexiest job of the 21st century.” The goal of this paper is to demystify these terms and, in doing so, provide a sound technical basis for exploring the policy challenges of analyzing large stores of information for national security purposes.

It is worth beginning with working definitions for these terms before exploring them in more detail. One can spend much time and effort developing firm definitions (it took the National Institute of Standards and Technology several years and sixteen versions to build consensus around the definition of cloud computing in NIST Special Publication 800-145), but the purpose here is only to provide definitions that are useful for furthering discussions of policy implications. Rather than defining big data in absolute terms, a task made nearly impossible by the rapid pace of advances in computing technology, one can define big data as a collection of data so large that it exceeds one’s capacity to process it in an acceptable amount of time with available tools. This difficulty in processing can result from the data’s volume (e.g., its size as measured in petabytes), its velocity (e.g., the number of new data elements added each second), or its variety (e.g., the mix of different types of data, including structured and unstructured text, images, and videos). Examples of systems managing massive quantities of data abound in the commercial and scientific arenas.
YouTube users upload over one hundred hours of video every minute, Wal-Mart processes more than one million transactions each hour, and Facebook stores, accesses and analyzes more than thirty petabytes
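The relative definition of big data given above can be made concrete with a small back-of-the-envelope check. The sketch below is illustrative only and is not from the paper: it assumes that processing capacity can be summarized as a single sustained throughput and that an "acceptable amount of time" is a fixed deadline, and it models only volume and velocity (variety is not captured). The names `Workload`, `Capacity`, and `is_big_data`, as well as the 10 GB/s and 1 TB/s throughputs and the one-day deadline, are hypothetical; only the thirty-petabyte figure comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    volume_bytes: float         # data already on hand (volume)
    arrival_bytes_per_s: float  # rate at which new data arrives (velocity)


@dataclass
class Capacity:
    throughput_bytes_per_s: float  # hypothetical sustained rate of available tools
    deadline_s: float              # hypothetical "acceptable amount of time"


def is_big_data(w: Workload, c: Capacity) -> bool:
    """Relative test: True if the available tools cannot finish in time.

    Variety is not modeled; only volume and velocity are considered.
    """
    # Velocity: if data arrives at least as fast as it can be processed,
    # the backlog never shrinks, so no deadline can be met.
    if w.arrival_bytes_per_s >= c.throughput_bytes_per_s:
        return True
    # Volume: processing at rate r while new data arrives at rate a,
    # the backlog is cleared when r*t = V + a*t, i.e. t = V / (r - a).
    t = w.volume_bytes / (c.throughput_bytes_per_s - w.arrival_bytes_per_s)
    return t > c.deadline_s


if __name__ == "__main__":
    one_day = 24 * 3600
    small = Capacity(throughput_bytes_per_s=10e9, deadline_s=one_day)  # 10 GB/s
    large = Capacity(throughput_bytes_per_s=1e12, deadline_s=one_day)  # 1 TB/s
    print(is_big_data(Workload(30e15, 0.0), small))  # 30 PB takes ~35 days -> True
    print(is_big_data(Workload(1e12, 0.0), small))   # 1 TB takes 100 s -> False
    print(is_big_data(Workload(0.0, 20e9), small))   # arrives faster than processed -> True
    print(is_big_data(Workload(30e15, 0.0), large))  # 30 PB takes ~8.3 hours -> False
```

The same thirty petabytes counts as big data for the 10 GB/s setup but not for the 1 TB/s one, which is the point of defining big data relative to available tools rather than by a fixed size threshold.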

Publication date: 2014